A Machine Learning Approach to Building Aligned Wordnets
نویسندگان
چکیده
WordNet is a lexical database describing English words and their senses. We propose a method for automatically producing similar resources for new languages by taking advantage of the original WordNet in conjunction with translation dictionaries. A small set of training mappings is used to learn a model for predicting associations between terms and senses. The associations are represented using a variety of scores that take into account structural properties as well as semantic relatedness and corpus frequency information. For evaluation, we created a German-language wordnet, and the data indicate a significantly better coverage and higher precision than previous heuristics. The resulting resources provide not only valuable information for monolingual NLP tasks but also enable a high degree of cross-lingual interoperability.
منابع مشابه
Coping with Lexical Gaps when Building Aligned Multilingual Wordnets
In this paper we present a methodology for automatically classifying the translation equivalents of a machine readable bilingual dictionary in three main groups: lexical units, lexical gaps (that is cases when a lexical concept of a language does not have a correspondent in the other language) and translation equivalents that need to be manually classified as lexical units or lexical gaps. This...
متن کاملDeveloping Parallel Sense-tagged Corpora with Wordnets
Semantically annotated corpora play an important role in natural language processing. This paper presents the results of a pilot study on building a sense-tagged parallel corpus, part of ongoing construction of aligned corpora for four languages (English, Chinese, Japanese, and Indonesian) in four domains (story, essay, news, and tourism) from the NTU-Multilingual Corpus. Each subcorpus is firs...
متن کاملUsing Multilingual Resources for Building SloWNet Faster
This project report presents the results of an approach in which synsets for Slovene wordnet were induced automatically from parallel corpora and already existing wordnets. First, multilingual lexicons were obtained from word-aligned corpora and compared to the wordnets in various languages in order to disambiguate lexicon entries. Then appropriate synset ids were attached to Slovene entries fr...
متن کاملWordnet creation and extension made simple: A multilingual lexicon-based approach using wiki resources
In this paper, we propose a simple methodology for building or extending wordnets using easily extractible lexical knowledge from Wiktionary and Wikipedia. This method relies on a large multilingual translation/synonym graph in many languages as well as synset-aligned wordnets. It guesses frequent and polysemous literals that are difficult to find using other methods by looking at back-translat...
متن کاملFine-Grained Word Sense Disambiguation Based on Parallel Corpora, Word Alignment, Word Clustering and Aligned Wordnets
The paper presents a method for word sense disambiguation based on parallel corpora. The method exploits recent advances in word alignment and word clustering based on automatic extraction of translation equivalents and being supported by available aligned wordnets for the languages in the corpus. The wordnets are aligned to the Princeton Wordnet, according to the principles established by Euro...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007